Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition
Aerial scene recognition is a fundamental task in remote sensing and has
recently received increased interest. While visual information from overhead
images, combined with powerful models and efficient algorithms, yields
considerable performance on scene recognition, it still suffers from variation
in ground objects, lighting conditions, etc. Inspired by the multi-channel
perception theory in cognitive science, in this paper we explore a novel
audiovisual aerial scene recognition task that uses both images and sounds as
input, with the aim of improving aerial scene recognition performance. Based
on the observation that specific sound events are more likely to be heard at a
given geographic location, we propose to exploit knowledge from sound events
to improve aerial scene recognition. For this purpose, we have constructed a
new dataset named the AuDio Visual Aerial sceNe reCognition datasEt (ADVANCE).
With the help of this dataset, we evaluate three proposed approaches for
transferring sound event knowledge to the aerial scene recognition task in a
multimodal learning framework, and show the benefit of exploiting audio
information for aerial scene recognition. The source code is publicly
available for reproducibility purposes.
Comment: ECCV 202
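The multimodal idea above can be sketched as decision-level (late) fusion: each modality produces per-class scores, and the fused prediction weights them together. This is a minimal illustration, not the paper's actual transfer approaches; the class scores, weighting scheme, and `audio_weight` value are all invented for the example.

```python
import math

def softmax(scores):
    """Convert raw class scores into a probability distribution."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def late_fusion(image_logits, audio_logits, audio_weight=0.3):
    """Weighted average of per-modality class probabilities."""
    p_img = softmax(image_logits)
    p_aud = softmax(audio_logits)
    return [(1 - audio_weight) * i + audio_weight * a
            for i, a in zip(p_img, p_aud)]

# Hypothetical scores over three scene classes.
image_logits = [2.0, 0.5, 0.1]   # visual model favours class 0
audio_logits = [0.2, 1.8, 0.3]   # sound events favour class 1
fused = late_fusion(image_logits, audio_logits)
predicted = max(range(len(fused)), key=fused.__getitem__)
```

With the visual modality weighted more heavily, its evidence dominates here; raising `audio_weight` would let strong sound-event evidence flip the decision.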
GazeDirector: Fully articulated eye gaze redirection in video
We present GazeDirector, a new approach for eye gaze redirection that uses model-fitting. Our method first tracks the eyes by fitting a multi-part eye region model to video frames using analysis-by-synthesis, thereby recovering eye region shape, texture, pose, and gaze simultaneously. It then redirects gaze by 1) warping the eyelids from the original image using a model-derived flow field, and 2) rendering and compositing synthesized 3D eyeballs onto the output image in a photorealistic manner. GazeDirector allows us to change where people are looking without person-specific training data, and with full articulation, i.e., we can precisely specify new gaze directions in 3D. Quantitatively, we evaluate both model-fitting and gaze synthesis, with experiments for gaze estimation and redirection on the Columbia gaze dataset. Qualitatively, we compare GazeDirector against recent work on gaze redirection, showing better results especially for large redirection angles. Finally, we demonstrate gaze redirection on YouTube videos by introducing new 3D gaze targets and by manipulating visual behavior.
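Step 1 above relies on flow-field warping. A minimal sketch of backward warping with nearest-neighbour sampling is shown below; in GazeDirector the flow comes from the fitted eye region model, whereas here the flow and the tiny image grid are invented, with a uniform one-pixel shift.

```python
def warp(image, flow):
    """Backward warp: out[y][x] = image[y - fy][x - fx], nearest neighbour.

    Pixels whose source falls outside the image are filled with 0.
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            fy, fx = flow[y][x]
            sy, sx = y - fy, x - fx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = image[sy][sx]
    return out

image = [[1, 2, 3],
         [4, 5, 6]]
# Constant flow: move every pixel one column to the right.
flow = [[(0, 1)] * 3 for _ in range(2)]
warped = warp(image, flow)
```

Real systems use sub-pixel flow with bilinear interpolation rather than integer nearest-neighbour lookup; the structure of the loop is the same.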
Investigating non-classical correlations between decision fused multi-modal documents
Correlation has been widely used to facilitate various information retrieval methods such as query expansion, relevance feedback, document clustering, and multi-modal fusion. In particular, correlation and independence are important issues when fusing different modalities that influence a multi-modal information retrieval process. The basic idea of correlation is that one observable can help predict or enhance another. In quantum mechanics, quantum correlation, called entanglement, is a sort of correlation between observables measured in atomic-size particles, even when these particles are not collected in ensembles. In this paper, we examine a multi-modal fusion scenario that might be similar to that encountered in physics by first measuring two observables (i.e., text-based relevance and image-based relevance) of a multi-modal document without relying on an ensemble of multi-modal documents already labeled in terms of these two variables. Then, we investigate the existence of non-classical correlations between pairs of multi-modal documents. Although there are some basic differences between entanglement and the classical correlation encountered in the macroscopic world, we investigate the existence of this kind of non-classical correlation through violation of the Bell inequality. Here, we experimentally test several novel association methods in a small-scale experiment. However, in the current experiment we did not find any violation of the Bell inequality. Finally, we present a series of discussions, which may provide theoretical and empirical insights and inspiration for future development of this direction.
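The Bell-inequality test mentioned above is usually run in its CHSH form: two binary (+1/-1) observables, each measured under two settings, with classical correlations bounded by |S| <= 2. The sketch below computes S from outcome pairs; the data is made up for illustration and, being classical, stays at the bound, mirroring the paper's negative result.

```python
def correlation(pairs):
    """Empirical E[a*b] over a list of (+1/-1, +1/-1) outcome pairs."""
    return sum(a * b for a, b in pairs) / len(pairs)

def chsh(e_ab, e_ab2, e_a2b, e_a2b2):
    """CHSH combination S = E(a,b) + E(a,b') + E(a',b) - E(a',b')."""
    return e_ab + e_ab2 + e_a2b - e_a2b2

# Perfectly correlated classical outcomes under every setting pair,
# e.g. text relevance and image relevance always agreeing.
same = [(1, 1), (-1, -1)] * 10
S = chsh(correlation(same), correlation(same),
         correlation(same), correlation(same))
```

Any genuinely non-classical correlation would push S above 2 (up to 2*sqrt(2) for entangled states); classical data like this cannot.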
Towards a Taxonomy of Cognitive RPA Components
Robotic Process Automation (RPA) is a discipline that is
increasingly growing hand in hand with Artificial Intelligence (AI) and
Machine Learning, enabling so-called cognitive automation. In this
context, the existing RPA platforms that include AI-based solutions
classify their components, i.e., the constituent parts of a robot that
perform a set of actions, in a way that seems to obey market or business
decisions instead of common-sense rules. To be more precise, components
that present similar functionality are identified with different names and
grouped in different ways depending on the platform that provides the
components. Therefore, analysing different cognitive RPA platforms
to check their suitability for a specific need is typically a time-consuming
and error-prone task. To overcome this problem, and to provide users with
support in the development of an RPA project, this paper proposes a method
for the systematic construction of a taxonomy of cognitive RPA components.
Moreover, the method is applied to components that solve selected
real-world use cases from industry, obtaining promising results.
Ministerio de Economía y Competitividad TIN2016-76956-C3-2-R; Junta de Andalucía CEI-12-TIC021; Centro para el Desarrollo Tecnológico Industrial P011-19/E0
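The core normalisation step such a taxonomy needs, collapsing vendor-specific component names onto a shared functional category, can be sketched as follows. The platform names, component names, and categories are all hypothetical, not taken from the paper's case studies.

```python
# Hypothetical vendor catalogues: different names, overlapping functionality.
PLATFORM_COMPONENTS = {
    "PlatformA": {"Read Text From Image": "OCR",
                  "Detect Language": "NLP"},
    "PlatformB": {"Screen OCR": "OCR",
                  "Sentiment Scoring": "NLP"},
}

def build_taxonomy(platforms):
    """Group vendor-specific component names under canonical categories."""
    taxonomy = {}
    for platform, components in platforms.items():
        for name, category in components.items():
            taxonomy.setdefault(category, []).append((platform, name))
    return taxonomy

taxonomy = build_taxonomy(PLATFORM_COMPONENTS)
```

Once components are keyed by canonical category rather than vendor name, comparing platforms for a given need becomes a lookup instead of a manual survey.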
Learning Tversky Similarity
In this paper, we advocate Tversky's ratio model as an appropriate basis for
computational approaches to semantic similarity, that is, the comparison of
objects such as images in a semantically meaningful way. We consider the
problem of learning Tversky similarity measures from suitable training data
indicating whether two objects tend to be similar or dissimilar.
Experimentally, we evaluate our approach to similarity learning on two image
datasets, showing that it performs very well compared to existing methods.
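Tversky's ratio model, which the paper takes as its basis, scores two feature sets by their common features relative to the features each has that the other lacks, with weights alpha and beta on the two directions of difference. A minimal sketch with invented feature sets:

```python
def tversky(a, b, alpha=0.5, beta=0.5):
    """Tversky similarity of feature sets a and b, in [0, 1]."""
    common = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    denom = common + alpha * only_a + beta * only_b
    return common / denom if denom else 0.0

x = {"wheel", "engine", "door"}
y = {"wheel", "engine", "wing"}
s = tversky(x, y)                      # 2 / (2 + 0.5 + 0.5) = 2/3
j = tversky(x, y, alpha=1.0, beta=1.0) # with alpha = beta = 1: Jaccard, 2/4
```

Learning a Tversky measure then amounts to fitting alpha, beta (and the feature representation) from pairs labelled similar or dissimilar; note the measure is asymmetric whenever alpha != beta.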
Detecting Human Activities Based on a Multimodal Sensor Data Set Using a Bidirectional Long Short-Term Memory Model: A Case Study
Human falls are one of the leading causes of fatal unintentional injuries
worldwide. Falls result in a direct financial cost to health systems, and indirectly,
to society’s productivity. Unsurprisingly, human fall detection and prevention is
a major focus of health research. In this chapter, we present and evaluate several
bidirectional long short-term memory (Bi-LSTM) models using a data set provided
by the Challenge UP competition. The main goal of this study is to detect 12 human
daily activities (six daily human activities, five falls, and one post-fall activity)
derived from multi-modal data sources: wearable sensors, ambient sensors, and
vision devices. Our proposed Bi-LSTM model leverages data from accelerometer
and gyroscope sensors located at the ankle, right pocket, belt, and neck of the subject.
We utilize a grid search technique to evaluate variations of the Bi-LSTM model and
identify a configuration that presents the best results. The best Bi-LSTM model
achieved good results for precision and F1-score: 43.30% and 38.50%, respectively.
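The grid search mentioned above exhaustively evaluates every combination of hyperparameter values and keeps the best-scoring configuration. The sketch below shows the search loop only; the grid values are invented, and the scoring function is a deterministic stand-in for what would actually be training a Bi-LSTM and measuring F1 on the Challenge UP labels.

```python
from itertools import product

# Hypothetical hyperparameter grid for the Bi-LSTM.
GRID = {
    "hidden_units": [64, 128],
    "dropout": [0.2, 0.5],
    "learning_rate": [1e-3, 1e-4],
}

def evaluate(config):
    """Placeholder score; a real run would train and validate a Bi-LSTM."""
    return config["hidden_units"] / 128 - config["dropout"] * 0.1

def grid_search(grid, score_fn):
    """Try every combination of grid values; return the best config and score."""
    keys = list(grid)
    best_config, best_score = None, float("-inf")
    for values in product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        score = score_fn(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

best_config, best_score = grid_search(GRID, evaluate)
```

The cost grows multiplicatively with each added hyperparameter (here 2 x 2 x 2 = 8 trainings), which is why grids are usually kept coarse.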